Online-offline activities and game-playing behaviors of avatars in 
a massive multiplayer online role-playing game

Abstract. - Massive multiplayer online role-playing games (MMORPGs) are very popular in
China and provide a potential platform for scientific research. We study the online-offline
activities of avatars in an MMORPG to understand their game-playing behavior. The statistical
analysis reveals that the active avatars can be classified into three types. Avatars of the
first type are owned by game cheaters who go online and offline at preset time intervals, so that
their online duration distributions are dominated by pulses. Avatars of the second type have
online durations that follow a Weibull distribution, which is confirmed by statistical tests. The
online duration distributions of the remaining avatars differ from those of the first two
types and cannot be described by a simple form. These findings have potential applications in the
game industry.



Introduction. - According to the Statistical Reports on the Internet
Development in China released by the China Internet Network Information
Center, the number of Chinese netizens rose sharply over the past twelve
years, from 0.63 million on 31 October 1997 to 338 million on 16 July 2009.
As of June 2009, 78.55 million netizens played massive multiplayer online
games (MMOGs). The MMOGs in mainland China are of two types, the massive
multiplayer online role-playing game (MMORPG) and the large-scale casual
game, each having about 49 million users. An MMOG is an online virtual
world where avatars can live and interact with one another in a somewhat
realistic manner. The huge number of MMOG users has raised many open
academic problems and attracted wide interest from researchers in diverse
fields, especially since the pioneering work of Edward Castronova, who
traveled in a virtual world called "Norrath" and performed a preliminary
analysis of its economy [1]. In particular, virtual worlds have great
potential for research in the social, behavioral, and economic sciences [2].

The outbreak of the SARS virus in 2003 and the recent global spread of
swine flu have pressed scientists to understand the epidemics of infectious
diseases. Many epidemic models have been proposed [3]. Although there are
several exceptions [4,5], the limited availability of empirical data on
human mobility remains a crucial challenge [6]. To partly overcome this
difficulty, one can design a virus in a virtual world and let it spread in
order to investigate its epidemic dynamics. As other applications, one can
design economic games in a virtual world to study the formation of human
cooperation (indeed, numerical experiments have been done [7]), and one can
record the economic behaviors of avatars to understand the evolution of the
wealth distribution. There are also efforts in computational social science
from a complex network perspective [8-13]. In addition to their scientific
potential, virtual worlds can serve as venues for real social activities,
such as marketing [14-16], and provide opportunities for players to make
real money [17].

In this Letter, we investigate the online-offline activities and
game-playing behaviors of the avatars inhabiting a server of a massive
multiplayer online role-playing game operated by Shanghai Shanda
Interactive Entertainment Ltd, which is the leader of China's MMORPG
industry and runs dozens of online games. We will show that the statistical
properties of the online-offline activities of individual avatars allow us
to classify avatars and identify game cheaters.

Description and preprocessing of the data. 

Our data are online-offline logs recorded during the time 
period from 1 September 2007 to 31 October 2007 of an 
MMORPG server run by Shanda Interactive Entertain- 
ment Ltd. There is one log file for each day. Each entry 
contains three pieces of information: the masked avatar 
ID, its login time, and its logout time. The resolution of 
the time stamps is 1 second. During the recording period, 19843
avatars entered the game. For security reasons, the true avatar
IDs have been encrypted into numbers from 1 to 19843.

An entry is written to the log file when an avatar goes offline.
Therefore, the entries in a log file are arranged in increasing order of
logoff time. For each avatar, we collect all the associated entries, whose
logon and logoff times form a two-dimensional array

$$E_{m\times 2} = \begin{bmatrix} t_1^{\rm on} & t_1^{\rm off} \\ \vdots & \vdots \\ t_i^{\rm on} & t_i^{\rm off} \\ \vdots & \vdots \\ t_m^{\rm on} & t_m^{\rm off} \end{bmatrix}, \tag{1}$$

where $t_i^{\rm on}$ and $t_i^{\rm off}$ are the logon and logoff times of
the $i$-th game-playing session of the avatar during the time period from
1 September 2007 to 31 October 2007. In the usual situation, we have

$$t_i^{\rm on} < t_i^{\rm off} < t_{i+1}^{\rm on} < t_{i+1}^{\rm off}, \tag{2}$$

which is illustrated in fig. 1.

Fig. 1: Schematic chart of game sessions for an individual avatar and the
definition of online durations.

We can calculate the time interval $\tau_i$ between the logon time
$t_i^{\rm on}$ and the logoff time $t_i^{\rm off}$ of the $i$-th game
session that an avatar played during the time period under investigation,

$$\tau_i = t_i^{\rm off} - t_i^{\rm on}, \tag{3}$$

which is termed the online duration of the $i$-th game session. We can also
calculate the offline duration between two successive game sessions of the
same avatar as follows,

$$\tau'_i = t_{i+1}^{\rm on} - t_i^{\rm off}, \tag{4}$$

which measures how long it takes for an avatar to log on to the game again
after he/she exits the game.
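The two duration definitions above can be illustrated with a minimal Python sketch. The function names and the toy session list are hypothetical and are not taken from the original logs; times are in seconds, matching the 1-second resolution of the time stamps.

```python
from typing import List, Tuple

Session = Tuple[int, int]  # (logon time, logoff time), both in seconds


def online_durations(sessions: List[Session]) -> List[int]:
    """Online duration of each session: logoff time minus logon time."""
    return [t_off - t_on for t_on, t_off in sessions]


def offline_durations(sessions: List[Session]) -> List[int]:
    """Offline duration: next logon time minus current logoff time."""
    return [
        sessions[i + 1][0] - sessions[i][1]
        for i in range(len(sessions) - 1)
    ]


# Hypothetical avatar with three sessions, listed in order of logoff time.
example = [(0, 20), (25, 45), (100, 128)]
print(online_durations(example))   # [20, 20, 28]
print(offline_durations(example))  # [5, 55]
```

An avatar with m sessions yields m online durations and m - 1 offline durations, which is the counting relation used in the next paragraph.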

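Assume that the sequence sizes of online and offline durations of avatar
$j$ are $n_j^{\rm on}$ and $n_j^{\rm off}$, respectively. Since each
$t_i^{\rm on}$ is followed by $t_i^{\rm off}$, we have

$$n_j^{\rm on} = n_j^{\rm off} + 1. \tag{5}$$

Summing over all avatars, the total numbers of online and offline
durations, $N^{\rm on} = \sum_j n_j^{\rm on}$ and
$N^{\rm off} = \sum_j n_j^{\rm off}$, satisfy

$$N^{\rm on} = N^{\rm off} + 19843. \tag{6}$$

We have calculated the online and offline duration sequences of all the
19843 avatars and find that $N^{\rm on}$ = 14,393,332 and $N^{\rm off}$ =
14,373,489, which is consistent with Eq. (6). On average, each avatar plays
about 12 sessions each day.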
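As a quick consistency check of the counts quoted above, the following worked calculation may help. It is added for illustration only; the 61-day length of the recording period follows from the dates 1 September to 31 October 2007.

```latex
\begin{align*}
N^{\rm on} - N^{\rm off} &= 14{,}393{,}332 - 14{,}373{,}489 = 19{,}843, \\
\frac{N^{\rm on}}{19{,}843 \times 61} &= \frac{14{,}393{,}332}{1{,}210{,}423} \approx 11.9 .
\end{align*}
```

The difference equals the number of avatars, as expected when every avatar has one more online duration than offline durations, and the ratio agrees with the quoted average of about 12 sessions per avatar per day.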

Preprocessing the data is necessary. We find that there are 41,845 offline
durations (about 0.3% of the total sample) that are negative, which can be
attributed to recording errors introduced by the system. There are also
1,221,811 offline durations (about 8.5% of the total sample) that are equal
to zero. The observation of zero offline durations, $\tau' = 0$, is simply
a consequence of the data recording rule that the log file records the
action of an avatar entering map B from map A as an offline-online
activity. For the above cases, we remove the offline entry by merging the
two entries $\{t_i^{\rm on}, t_i^{\rm off}\}$ and
$\{t_{i+1}^{\rm on}, t_{i+1}^{\rm off}\}$ into one
$\{t_i^{\rm on}, t_{i+1}^{\rm off}\}$. It is possible that the offline
duration associated with an inter-map transfer of an avatar is greater than
0 if there is heavy network traffic. For the online durations, all $\tau$
values are nonnegative and there are 52,442 online durations (about 0.4% of
the total sample) that are equal to 0. The online durations with
$\tau = 0$ are excluded from further analysis.
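The merging rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, the sessions are assumed to be listed in order of logoff time (as in the log files), and dropping zero-length sessions after merging rather than before is an assumption.

```python
def merge_sessions(sessions):
    """Merge consecutive sessions separated by a non-positive offline gap.

    sessions: list of (logon, logoff) pairs in seconds, ordered by logoff
    time. Returns the cleaned list of (logon, logoff) pairs.
    """
    merged = []
    for t_on, t_off in sessions:
        if merged and t_on - merged[-1][1] <= 0:
            # Zero gap (map transfer) or negative gap (recording error):
            # keep the earlier logon time and the later logoff time.
            merged[-1] = (merged[-1][0], t_off)
        else:
            merged.append((t_on, t_off))
    # Assumption: sessions with zero online duration are dropped last.
    return [(a, b) for a, b in merged if b - a > 0]


# Hypothetical log of one avatar: a map transfer at t = 10, a recording
# error (logon at 28 before the previous logoff at 30), and an empty session.
raw = [(0, 10), (10, 30), (28, 40), (50, 50), (60, 90)]
print(merge_sessions(raw))  # [(0, 40), (60, 90)]
```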

Collective behaviors. - The instantaneous number of online avatars per
second can be constructed from the online-offline data, and its statistical
properties have been investigated [18]. It was found that (i) the number of
online avatars exhibits one-day periodicity and a clear intraday pattern;
(ii) the distribution of its fluctuations has a leptokurtic non-Gaussian
shape with power-law tails; (iii) the increments of the online avatar
number, after removing the intraday pattern, are uncorrelated, whereas
their absolute values have long-term correlations; and (iv) both time
series exhibit multifractal nature [18]. These properties are relevant to
the traffic of the server and the profit of the MMORPG company.

In this section, we investigate the collective behaviors of individual
avatars based on their gaming activities. Three quantities are studied. For
each avatar, we define two quantities, the total number of sessions $m$ and
the total online duration $T$, and take the whole population as a sample to
describe the collective activities. The third quantity is the duration of
individual sessions, pooled over all avatars.

Distribution of the number of gaming sessions of individual avatars. For
each avatar, we count the number $m$ of gaming sessions that he/she played
during the two-month period under investigation. The sequence has 19843
data points. The empirical probability density function $p(m)$ of the
individual gaming session number $m$ is illustrated in fig. 2. One can
observe a power-law relation between $p(m)$ and $m$:

$$p(m) \sim m^{-(\alpha_m+1)}, \tag{7}$$

where the power-law exponent can be obtained approximately by the following
equation based on maximum likelihood estimation [19],

$$\alpha_m = N_m \left[ \sum_{i=1}^{N_m} \ln \frac{m_i}{m_{\min}} \right]^{-1}, \tag{8}$$

where $N_m$ is the number of $m$ values that are no less than $m_{\min}$.
By setting $m_{\min} = 1$, Eq. (8) gives the tail exponent
$\alpha_m = 0.39$. The Kolmogorov-Smirnov test confirms that the power-law
distribution can model the data with high statistical significance.

Fig. 2: Empirical probability density function $p(m)$ of the number of
sessions $m$ of 19843 individual avatars.

The very small value of $\alpha_m$ indicates that the distribution decays
very slowly. The daily number of sessions is no more than 10 for 97.1% of
the avatars and no more than 1 for 85.2% of the avatars. In addition, we
notice that the fluctuation at the tail of the distribution $p(m)$ is high
and that large $m$ values seem to occur more often than predicted by the
fitted $p(m)$. The maximal value of the $m$ sequence is 187812 (Avatar ID:
4636), which means that the avatar went online and offline 128.3 times per
hour! The evolution of the daily number $m(t)$ of game sessions played by
avatar 4636 is illustrated on the right axis of fig. 3. For comparison, the
left axis shows the evolution of the daily number $m(t)$ of game sessions
for avatar 16577.

Fig. 3: Evolution of the daily number of game sessions played by two
typical avatars 4636 (right axis) and 16577 (left axis).

Figure 4 shows the occurrence $O(\tau)$ of the online duration $\tau$ for
avatar 4636. There are two spikes in fig. 4 located at around $\tau = 20$
and 28. We observe that $O(20) = 134544$ and $O(28) = 30302$, which amount
to 71.6% and 16.1% of the number of online durations. The inset shows the
associated $\tau$ sequence. A clear change of cheating behavior from
$\tau \approx 20$ to $\tau \approx 28$ is observed, which happened on
14 October 2007.

Fig. 4: Occurrence $O(\tau)$ of the online duration $\tau$ for avatar 4636.
The inset shows the associated $\tau$ sequence.
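The occurrence counts discussed for avatar 4636 suggest a simple way to flag pulse-dominated avatars. The sketch below is only an illustration: the function names and the 50% threshold are hypothetical and are not the criterion used in the paper. The demo list is rebuilt from the counts quoted in the text, with the remaining sessions given a placeholder duration.

```python
from collections import Counter


def dominant_durations(durations, top=2):
    """Return (duration, occurrence, share) for the most frequent durations."""
    counts = Counter(durations)
    total = len(durations)
    return [(tau, n, n / total) for tau, n in counts.most_common(top)]


def looks_scripted(durations, top=2, min_share=0.5):
    """Flag a sequence if a few fixed durations account for most sessions.

    min_share is a hypothetical threshold chosen for illustration.
    """
    share = sum(s for _, _, s in dominant_durations(durations, top))
    return share >= min_share


# Counts quoted for avatar 4636: O(20) = 134544 and O(28) = 30302 out of
# 187812 sessions. The remaining 22966 sessions get a placeholder value.
demo = [20] * 134544 + [28] * 30302 + [25] * 22966
for tau, n, share in dominant_durations(demo):
    print(tau, n, round(share, 3))
print(looks_scripted(demo))  # True
```

In practice a tolerance of a second or two around each spike would probably be needed, since neighboring durations are also elevated for some of the pulses.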



Distribution of the total time spent by individual avatars. An important
measure of an avatar's game-playing behavior is the total time he/she
spends in the game, which can be calculated as follows,

$$T_j = \sum_{i=1}^{m_j} \tau_i, \tag{9}$$

which is the sum of all session durations of avatar $j$. The size of the
$T_j$ series is 19843. The maximal total time is 1142 hours (Avatar ID:
4636), which means that the avatar was active in the game 18.7 hours per
day.

Figure 5 depicts the probability density function $p(T)$ of the total time
$T$ for the whole population. One can observe a power-law behavior in the
tail of $p(T)$:

$$p(T) \sim T^{-(\alpha_T+1)}, \quad \text{for } T \geq T_{\min}. \tag{10}$$

The tail exponent $\alpha_T$ can also be determined by maximum likelihood
estimation using Eq. (8), where the argument $m$ is replaced by $T$. By
setting $T_{\min} = 500$, Eq. (8) gives the tail exponent
$\alpha_T = 0.35$. It is interesting to note that the tail exponent
$\alpha_T$ of the total time $T_j$ is very close to the power-law exponent
$\alpha_m$ of the session number $m_j$.

Fig. 5: Empirical probability density function $p(T)$ of the total
game-playing time $T$ of 19843 individual avatars.
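The maximum likelihood estimation of the tail exponents can be sketched as follows. The code uses the standard continuous-variable (Hill-type) estimator for a density decaying as x^{-(alpha+1)} above a threshold; that this is exactly the form of Eq. (8) is an assumption, and the function name and synthetic sample are hypothetical.

```python
import math
import random


def tail_exponent(values, x_min):
    """Hill-type MLE of alpha for p(x) ~ x^{-(alpha+1)}, x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above x_min")
    return len(tail) / log_sum


# Synthetic check: draw from a pure power law with alpha = 0.39 by
# inverse-transform sampling, x = x_min * u^(-1/alpha) with u in (0, 1].
random.seed(1)
alpha_true, x_min = 0.39, 1.0
sample = [x_min * (1.0 - random.random()) ** (-1.0 / alpha_true)
          for _ in range(20000)]
print(round(tail_exponent(sample, x_min), 2))
```

For integer-valued data such as the session number m, the continuous estimator is only approximate, which may be why the text describes the exponent as obtained approximately.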

Distribution of the durations of individual sessions. We pool the online
durations $\tau$ of all the avatars into one sample and investigate its
distribution. The size of the whole sample is 13,092,371. Figure 6 shows
the empirical distribution density $p(\tau)$ of the online durations $\tau$
in log-log scales. The most striking feature of fig. 6 is the occurrence of
many spikes, which are located at $\tau$ = 2, 12, 20, 25, 28, 44, 71, 87,
300, 505, 600, 614, 1200, 1500, 1800, 2000, 2411, 3000, 3600, 5000, and
10000. These spikes are outliers that are markedly greater than the normal
level. For some of the spikes, the neighboring values are also greater than
the normal level. These spikes indicate abnormal behavior of some players,
which is usually related to game cheating. This observation can be used to
identify game cheaters.

Fig. 6: Empirical distribution density $p(\tau)$ of the online durations
$\tau$. The spikes are located at $\tau$ = 2, 12, 20, 25, 28, 44, 71, 87,
300, 505, 600, 614, 1200, 1500, 1800, 2000, 2411, 3000, 3600, 5000, and
10000.

Consider the spike at $\tau = 5000$. There are 466 game sessions with
$\tau = 5000$. We find that there are 15 avatars (IDs: 339, 3797, 5542,
5954, 6418, 6886, 7044, 7767, 10217, 11436, 15611, 15613, 17733, 18075,
18246) whose online duration sequences contain at least one point with
$\tau = 5000$. The occurrence of $\tau = 5000$ is 1 for all the avatars
except avatars 15611 and 15613, whose occurrences are 233 and 220,
respectively. Figure 7 shows the occurrence $O(\tau)$ of the online
duration $\tau$ for avatar 15611. There are two spikes in fig. 7 located at
$\tau = 3600$ and 5000, whose occurrences are $O(3600) = 137$ and
$O(5000) = 233$. We also observe that $O(3601) = 60$ and $O(5001) = 165$.
Note that the size of the online duration sequence of this avatar is 1115.
Hence these four $\tau$ values account for 53.36% of its sessions. The
inset shows the associated $\tau$ sequence. A clear change of cheating
behavior from $\tau = 5000$ to $\tau = 3600$ is observed, which happened on
10 October 2007. For avatar 15613, very similar behavior is observed and a
change of cheating behavior from $\tau = 5000$ to $\tau = 3600$ happened on
12 October 2007. The striking similarity of the behavior of the two avatars
implies that their host players might be closely related.

Fig. 7: Occurrence $O(\tau)$ of the online duration $\tau$ for avatar
15611. The inset shows the associated $\tau$ sequence.
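Tracing a spike in the pooled distribution back to individual avatars, as done above for the 5000-second spike, amounts to counting sessions per avatar at that duration. The sketch below is illustrative only: the function name and the record layout are hypothetical, and the toy records are rebuilt from the counts quoted in the text.

```python
from collections import Counter


def avatars_at_duration(records, tau):
    """Count, per avatar, the sessions whose online duration equals tau.

    records: iterable of (avatar_id, online_duration_in_seconds) pairs.
    """
    return Counter(avatar for avatar, duration in records if duration == tau)


# Toy records rebuilt from the counts quoted in the text for 5000 s.
single_hit_ids = [339, 3797, 5542, 5954, 6418, 6886, 7044, 7767, 10217,
                  11436, 17733, 18075, 18246]
records = [(15611, 5000)] * 233 + [(15613, 5000)] * 220
records += [(avatar, 5000) for avatar in single_hit_ids]
records += [(15611, 3600)] * 137  # other durations do not enter the count

counts = avatars_at_duration(records, 5000)
print(sum(counts.values()), len(counts))  # 466 15
print(counts.most_common(2))              # [(15611, 233), (15613, 220)]
```

Two avatars account for 453 of the 466 sessions, which is what singles them out among the 15 avatars that hit this duration.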



Online duration distributions for individual avatars. - We now turn to the
online-offline behaviors of individual avatars, which are of potential
interest and great importance for the identification of game cheaters, the
detection of server traffic, the understanding of the game-playing patterns
of players, and the design and improvement of online games.

Owing to considerations of commercial applications and of the statistical
reliability of the results, we are more interested in active avatars when
investigating game-playing patterns at the level of individual avatars.
There are numerous avatars whose total numbers $m$ of online sessions are
small. For instance, the proportions of avatars with $m \leq 1$,
$m \leq 2$, $m \leq 10$, $m \leq 50$ and $m \leq 100$ are 27.8%, 43.2%,
66.4%, 83.6% and 88.9%, respectively. Although an avatar with $m = 50$ is
not inactive, it is hard to construct its empirical distribution $p(\tau)$
with sufficient statistics. In addition, according to the 7th Online Game
Research Report (2007) and the 8th Online Game Industry Research Report
(2008), about 92% of players spent more than one hour playing online games
every day. Combining these two facts, we exclude from our analysis the
avatars who were online on no more than 30 days or whose daily cumulative
online durations were less than half an hour. This leaves 947 avatars.
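The activity filter can be sketched as below. The text does not say how a session spanning midnight is assigned to a day, nor whether the half-hour criterion applies to every day or to an average, so both choices in the code are assumptions, and the function name and default parameters are hypothetical.

```python
DAY = 86400  # seconds per day


def is_active(sessions, min_days=30, min_daily_seconds=1800):
    """Decide whether an avatar passes the activity filter.

    sessions: list of (logon, logoff) pairs in seconds since the start of
    the recording period.
    """
    seconds_per_day = {}
    for t_on, t_off in sessions:
        day = t_on // DAY  # assumption: credit a session to its logon day
        seconds_per_day[day] = seconds_per_day.get(day, 0) + (t_off - t_on)
    if len(seconds_per_day) <= min_days:
        return False  # online on no more than 30 days
    # Assumption: compare the mean over active days with half an hour.
    mean_daily = sum(seconds_per_day.values()) / len(seconds_per_day)
    return mean_daily >= min_daily_seconds


# Hypothetical avatar playing one hour on each of 31 consecutive days.
hourly = [(d * DAY, d * DAY + 3600) for d in range(31)]
print(is_active(hourly))  # True
```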

As shown in the previous section, especially in fig. 4 and fig. 7, there
are bursts or pulses in the histogram of occurrences at some fixed online
durations $\tau$. Such avatars cannot be operated by humans; rather, they
are controlled by robots (automated programs), whose host players are game
cheaters. Based on the regular behavior of the program-controlled avatars,
we filter out 258 robot avatars that were too active. Finally, 689 avatars
remain for further analysis.

Weibull distributions. In order to check whether these active avatars share
the same online-offline behavior, we determine the empirical complementary
cumulative distribution $C(\tau)$ of each avatar. Visual inspection
suggests that most distributions have fat tails, which could be modeled by
the Weibull distribution [20,21]

$$C(\tau) = \exp\left[-(\tau/\tau_0)^b\right], \tag{11}$$

where $\tau_0$ is the characteristic time and $b < 1$ is the exponent. It
follows immediately that

$$\ln[1/C(\tau)] = (\tau/\tau_0)^b, \tag{12}$$

which means that $\ln[1/C(\tau)]$ scales as a power law with respect to
$\tau$. Figure 8 shows $\ln[1/C(\tau)]$ as a function of $\tau$ for three
avatars. All three curves exhibit power laws with scaling ranges spanning
about three orders of magnitude, which is graphical evidence that the
distribution of the online durations for individual avatars of this type is
Weibull.

Fig. 8: Dependence of $\ln[1/C(\tau)]$ as a power-law function of $\tau$
for three typical avatars (IDs: 6, 2368, 4794).
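The power-law scaling of ln[1/C] with the online duration gives a quick graphical estimate of the Weibull parameters: on log-log axes the slope is b and the intercept is -b ln(tau_0). The sketch below fits that line by ordinary least squares. It is an illustration and not the fitting procedure of the paper (which uses maximum likelihood); the function name and the plotting position C_i = 1 - i/(n+1) are assumptions.

```python
import math


def fit_weibull_loglog(durations):
    """Estimate (b, tau0) from the straight line ln ln(1/C) = b ln(tau) - b ln(tau0).

    durations must be strictly positive; C is the empirical complementary
    cumulative distribution evaluated with the plotting position
    C_i = 1 - i / (n + 1), which keeps C strictly between 0 and 1.
    """
    xs = sorted(durations)
    n = len(xs)
    pts = []
    for i, tau in enumerate(xs, start=1):
        c = 1.0 - i / (n + 1.0)
        pts.append((math.log(tau), math.log(math.log(1.0 / c))))
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx                       # slope of the log-log line
    intercept = mean_y - b * mean_x     # equals -b * ln(tau0)
    return b, math.exp(-intercept / b)


# Hypothetical data placed exactly on Weibull quantiles (b = 0.68, tau0 = 300).
n, b_true, tau0_true = 200, 0.68, 300.0
quantiles = [tau0_true * math.log(1.0 / (1.0 - i / (n + 1.0))) ** (1.0 / b_true)
             for i in range(1, n + 1)]
b_hat, tau0_hat = fit_weibull_loglog(quantiles)
print(round(b_hat, 3), round(tau0_hat, 1))
```

A least-squares fit of this kind weights the sparse tail heavily, so it is best read as a visual aid alongside the statistical tests described next.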

In order to identify the avatars whose online durations conform to the
Weibull distribution, we design an approach to classify the avatars based
on statistical tests. For each avatar, the empirical distribution of its
online durations is fitted to the Weibull formula by maximum likelihood
estimation (MLE). The fitted formula is then converted to its cumulative
form $F(\tau)$. We then investigate whether the sample of online durations
is drawn from the "theoretical" distribution $F(\tau)$ of the best MLE fit.
The null hypothesis is that the data can be modeled by a Weibull
distribution. We can perform the Kolmogorov-Smirnov (KS) test [22,23] for
this purpose. The KS statistic, which measures the distance between the
empirical cumulative distribution function of the sample and the cumulative
distribution function of the best fit, is defined as

$$\mathrm{KS} = \max\left(\left|F_{\rm emp} - F\right|\right), \tag{13}$$

where $F_{\rm emp}$ is the cumulative distribution function of the
empirical sample and $F$ is the cumulative distribution function from the
best MLE fit. Alternatively, the Cramér-von Mises (CvM) criterion can also
be used for judging the goodness-of-fit of a probability distribution
compared with a given distribution [24], which is given by

$$W^2 = \int_{-\infty}^{\infty} \left[F_{\rm emp}(\tau) - F(\tau)\right]^2 {\rm d}F(\tau). \tag{14}$$

In one-sample applications, the statistic can be written as follows
[25,26],

$$nW^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[\frac{2i-1}{2n} - F(\tau_i)\right]^2, \tag{15}$$

where $n$ is the sample size and the $\tau_i$ are the observed online
durations sorted in increasing order. If the KS (or CvM) statistic is less
than a critical value, the null hypothesis cannot be rejected.
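The fit-then-test procedure described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the shape parameter is found by bisection on the Weibull likelihood equation, the CvM statistic is written in its standard one-sample form, and the critical values needed for the actual decision at a given significance level are deliberately left out.

```python
import math
import random


def weibull_mle(durations, lo=0.01, hi=10.0, iters=60):
    """Maximum likelihood fit of C(tau) = exp[-(tau/tau0)^b]; returns (b, tau0)."""
    x_max = max(durations)
    z = [x / x_max for x in durations]   # rescale to avoid overflow
    logs = [math.log(v) for v in z]
    mean_log = sum(logs) / len(z)

    def score(b):
        # Likelihood equation for the shape; it is increasing in b.
        w = [v ** b for v in z]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    for _ in range(iters):               # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    b = 0.5 * (lo + hi)
    tau0 = x_max * (sum(v ** b for v in z) / len(z)) ** (1.0 / b)
    return b, tau0


def weibull_cdf(tau, b, tau0):
    return 1.0 - math.exp(-((tau / tau0) ** b))


def ks_statistic(durations, b, tau0):
    """Largest distance between the empirical CDF and the fitted CDF."""
    xs = sorted(durations)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = weibull_cdf(x, b, tau0)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def cvm_statistic(durations, b, tau0):
    """One-sample Cramér-von Mises statistic n * W^2."""
    xs = sorted(durations)
    n = len(xs)
    total = 1.0 / (12 * n)
    for i, x in enumerate(xs, start=1):
        total += ((2 * i - 1) / (2 * n) - weibull_cdf(x, b, tau0)) ** 2
    return total


# Synthetic durations; weibullvariate takes (scale, shape).
random.seed(0)
sample = [random.weibullvariate(300.0, 0.68) for _ in range(5000)]
b_fit, tau0_fit = weibull_mle(sample)
print(round(b_fit, 2), round(ks_statistic(sample, b_fit, tau0_fit), 4))
```

Because the parameters are estimated from the same sample that is tested, tabulated KS critical values for a fully specified distribution are conservative; a bootstrap over synthetic Weibull samples is one common way to obtain critical values in that situation.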

At the significance level of 1%, we find that there are 489 avatars whose
online durations can be well modeled by the Weibull distribution. Figure 9
presents the histogram of the fitted exponent $b$ for the 489 avatars.
There is one value of $b$ (ID: 5483) that is greater than 1, which
corresponds to a distribution that decays faster than an exponential. We
find that the distribution of $b$ is unimodal and $b = 0.68 \pm 0.12$.

Fig. 9: Histogram of the fitted exponent $b$ for the 489 avatars.

Other distributions. For the avatars whose online durations do not follow
Weibull distributions, we cannot find a simple form for the online duration
distribution. Figure 10 illustrates the survival distributions of $\tau$
for three typical avatars in log-log scales. The first-order derivative
appears to be discontinuous for avatars 13755 and 18096, since there are
clear kinks in the $C(\tau)$ curves. For avatar 19750, the $C(\tau)$ curve
looks like a Weibull truncated with a power-law tail. However, statistical
tests show that it is neither a Weibull distribution nor a power-law tailed
distribution. The inset of fig. 10 shows the corresponding curves of
$\ln[1/C(\tau)]$ with respect to $\tau$ for the three avatars. No evident
power-law regime is observed in the three curves, which confirms that the
online durations of these avatars do not follow Weibull distributions.

Fig. 10: Survival distributions $C(\tau)$ of the online durations $\tau$
for three typical avatars (IDs: 13755, 18096, 19750). The inset shows the
corresponding plots of $\ln[1/C(\tau)]$ versus $\tau$.



Conclusion. - In summary, we have studied the online-offline activities and
game-playing behaviors of avatars in a massive multiplayer online
role-playing game based on the log files recorded during the time period
from 1 September 2007 to 31 October 2007. We found that the number of game
sessions and the total online time of individual avatars are distributed
according to a power law, with large bursts in both tails. In addition, the
distribution of the online durations of all avatars as a whole sample is
decorated by sharp spikes. These phenomena are signals of game cheaters who
used robots to control their avatars, which can be identified by the
abnormal pulses in the distribution of online durations for individual
avatars. We also found that there is a group of normal avatars whose online
durations follow Weibull distributions. These findings have potential
applications in the online game industry.

Our finding that the online durations of many normal avatars are
distributed according to a Weibull distribution adds new evidence that
human dynamics is not a simple Poisson process [27]. However, the Weibull
behavior cannot be explained by existing models based on priority queues
[27], cascading nonhomogeneous Poisson processes [28], or adaptive interest
[29].